Ijraset Journal For Research in Applied Science and Engineering Technology
Authors: Prof. Swati Gade, Yash Chaudhari, Neelam Koli, Pranav Sanas, Amruta Tilekar
DOI Link: https://doi.org/10.22214/ijraset.2023.52628
Sign language is a vital form of communication for the deaf and hard-of-hearing community. The development of sign language recognition systems plays a crucial role in bridging the communication gap between hearing and non-hearing populations. This paper presents a Sign Language Recognition System (SLRS) built using the MediaPipe Hands model and web technologies such as HTML, CSS, and JavaScript. The system utilizes real-time hand tracking and gesture recognition to interpret sign language gestures and provides corresponding textual output. The proposed system demonstrates the potential for accessible and inclusive communication through the integration of machine learning and web-based technologies.
I. INTRODUCTION
Sign language is a visual means of communication that uses hand gestures, facial expressions, and body movements to convey meaning. Sign language is the primary means of communication for millions of deaf people around the world, making it essential to develop technology that facilitates effective interaction between hearing and non-hearing individuals. This paper presents an innovative approach to sign language recognition by leveraging the MediaPipe Hands model and web technologies.
A hand gesture recognition system provides a natural and modern form of nonverbal communication. Our system detects the coordinates of the hand knuckles using the MediaPipe Hands model and passes them to a KNN classifier, which classifies the hand signs into the corresponding English alphabet letters in real time using a webcam. Because classification operates on landmark coordinates rather than raw pixel appearance, the pre-trained MediaPipe model also reduces skin-tone-related bias.
Sign language comprises three main forms of signing: fingerspelling, in which each sign represents a single letter; word-level signing, in which a sign represents an entire word; and non-manual features such as facial expressions and body posture. Our system focuses on fingerspelling (the finger vocabulary), where each gesture/sign represents a unique English alphabet letter.
II. LITERATURE SURVEY
T. Petkar, Tanay Patil, A. Wadhankar, V. Chandore, V. Umate, and D. Hingnekar (2022) proposed a real-time sign language recognition system using OpenCV and YOLOv5 algorithm. Their system aimed to overcome the communication barrier between sign language users and verbal speakers. The advantages of their system included portability, user-friendly interface, and cost-effectiveness. However, a limitation was that their system only translated words received as input into sign language.[2]
Sahoo, Jaya Prakash, Allam Jaya Prakash, Paweł Pławiak, and Saunak Samantray (2022) focused on real-time hand gesture recognition using a fine-tuned convolutional neural network (CNN). Their system aimed to develop a user-independent interface with high recognition performance. The proposed methodology included data acquisition, pre-processing techniques such as segmentation and filtering, and recognition of hand gestures using AlexNet and VGG-16. The advantages of their approach included improved segmentation with RGB-D sensors and real-time recognition. However, a limitation was the assumption that the hand is the closest object in front of the Kinect sensor.[3]
Abdullah Mujahid, Mazhar Javed Awan, Awais Yasin, Mazin Abed Mohammed, Robertas Damaševičius, Rytis Maskeliūnas, Karrar Hameed Abdulkareem (2021) proposed a lightweight hand gesture recognition model based on YOLOv3 and DarkNet-53 CNNs. Their system achieved high accuracy in complex environments and low-resolution images. The advantages included gesture recognition without additional preprocessing or image enhancement. However, a limitation was the requirement for high computation power.[4]
J. P. Singh, A. Gupta and Ankita (2020) researched hand gesture recognition for bridging the communication gap between speech-impaired individuals and traditional speakers. Their proposed methodology included a CNN for image identification and video-based methods using dynamic time warping (DTW) and hidden Markov models (HMM) for dynamic gesture classification. The advantages of their system included a high expected accuracy of 0.95, making it suitable for widespread use. However, a disadvantage was the requirement for large memory and computational resources for training with large datasets.[1]
A. Ojha, A. Pandey, S. Maurya, A. Thakur, Dr Dayananda P (2020) developed a real-time sign language translation system using a Convolutional Neural Network (CNN) and OpenCV. Their system aimed to capture sign gestures through a webcam, translate them into text and convert the text into speech. The advantages of their system included a 95% accuracy and the potential for extension to other sign languages with sufficient training data. However, a limitation was the need for gloves to eliminate variations in skin complexion and the possibility of incorrect predictions with bad gesture postures.[5]
In summary, this literature survey highlights different approaches to hand gesture recognition and sign language translation using techniques such as CNNs, video-based methods, and object detection algorithms. Each approach has its advantages and limitations, such as computational requirements, accuracy, and preprocessing considerations.
III. IMPLEMENTATION AND WORKING
The implementation of the SLRS involves the following components:
A. MediaPipe Hands Model
MediaPipe Solutions provides a suite of libraries and tools for you to quickly apply artificial intelligence (AI) and machine learning (ML) techniques in your applications. You can plug these solutions into your applications immediately, customize them to your needs, and use them across multiple development platforms. MediaPipe Solutions is part of the MediaPipe open-source project, so you can further customize the solutions code to meet your application needs.
The MediaPipe Hands model is integrated into the web application using JavaScript libraries. This model provides real-time hand-tracking capabilities, enabling precise detection and tracking of hand movements.
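A minimal sketch of such an integration, assuming the @mediapipe/hands and @mediapipe/camera_utils scripts are loaded on the page and videoElement refers to the page's video element (the element and callback names here are illustrative, not taken from the paper):

    // Configure the MediaPipe Hands solution for single-hand, real-time tracking.
    const hands = new Hands({
      locateFile: (file) => `https://cdn.jsdelivr.net/npm/@mediapipe/hands/${file}`
    });
    hands.setOptions({
      maxNumHands: 1,
      minDetectionConfidence: 0.5,
      minTrackingConfidence: 0.5
    });
    hands.onResults(onResults); // onResults receives the detected landmarks for each frame

    // Stream webcam frames into the model.
    const camera = new Camera(videoElement, {
      onFrame: async () => { await hands.send({ image: videoElement }); },
      width: 640,
      height: 480
    });
    camera.start();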
We used hand world coordinates to represent the location of the hand landmarks. We then used the hand landmarks to extract features that represent the sign language gesture. We experimented with several feature extraction techniques, including the position and orientation of the fingers.
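As an illustration, one simple representation flattens the 21 world-coordinate landmarks (reported by MediaPipe in metres, relative to the hand's approximate geometric centre) into a 63-element feature vector; the helper name below is ours, not part of MediaPipe:

    // Flatten the 21 hand landmarks into a 63-element feature vector for the classifier.
    function toFeatureVector(worldLandmarks) {
      return worldLandmarks.flatMap((p) => [p.x, p.y, p.z]);
    }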
B. Gesture Recognition Model
ml5.js is an open-source, friendly, high-level interface to TensorFlow.js, a library that handles GPU-accelerated mathematical operations and memory management for machine learning algorithms. We used ml5.js to create our KNN model. The k-nearest neighbours (KNN) algorithm is a supervised machine learning algorithm that can be used for both classification and regression problems.
We trained the KNN classifier on a dataset of sign language gestures, which consisted of labelled examples of various sign language gestures. The dataset was created by us and included samples of sign language gestures commonly used in the deaf community. We extracted hand world coordinates from the video recordings of the gestures and labelled them with the corresponding sign language gesture.
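A sketch of how such a classifier can be created and trained with ml5.js, reusing the toFeatureVector helper from the previous sketch; the label values are illustrative:

    // Create a KNN classifier and add labelled feature vectors collected
    // while a signer holds each alphabet pose in front of the webcam.
    const knn = ml5.KNNClassifier();

    function addTrainingExample(worldLandmarks, label) {
      knn.addExample(toFeatureVector(worldLandmarks), label); // e.g. label = "A"
    }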
C. Web Interface
The web interface is designed using HTML and CSS to provide a user-friendly experience. The interface includes a video feed that captures the user's hand movements and a text area to display the recognized sign language gestures.
D. Real-time Processing
Using JavaScript, the captured video frames from the webcam are processed in real time. Captured video frames go through the same feature extraction process used during training. The MediaPipe Hands model tracks hand movements, and the gesture recognition model (KNN Classifier) interprets the gestures to generate textual output.
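A sketch of the per-frame callback that ties these pieces together, reusing the hands, knn, and toFeatureVector names from the earlier sketches:

    // Called by MediaPipe for every processed video frame.
    function onResults(results) {
      if (!results.multiHandWorldLandmarks || results.multiHandWorldLandmarks.length === 0) return;
      const features = toFeatureVector(results.multiHandWorldLandmarks[0]);
      knn.classify(features, (error, result) => {
        if (error) return console.error(error);
        bufferPrediction(result.label); // feed the majority-voting buffer (see the sketch in the next section)
      });
    }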
E. Speech Synthesis
The SpeechSynthesis (text-to-speech) interface of the Web Speech API is the controller interface for the browser's speech service. It is used to read out the signs/words detected by the recognition system.
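For example, a recognized letter or word can be read aloud with the standard SpeechSynthesis interface:

    // Speak the recognized text aloud using the browser's speech synthesis service.
    function speak(text) {
      const utterance = new SpeechSynthesisUtterance(text);
      window.speechSynthesis.speak(utterance);
    }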
The system works as follows: An impaired person interacts with the system by showing hand gestures in front of a live camera. The captured images are processed using Mediapipe's hand landmark model, which detects and localizes the coordinates of 21 hand knuckles within the detected hand regions. These coordinates are then fed into a K-nearest neighbours (KNN) classifier. The KNN algorithm classifies the sign pose based on the input coordinates. The predicted sign pose is mapped to the corresponding English alphabet, which is displayed on the screen.
To create words, the user can perform a series of sign poses. Two special signs are included: one for inserting a space to separate words in the results, and another for deleting previously identified letters. To determine the correct letter, multiple frames from the camera are captured, and the predicted letters are stored in a fixed-length buffer or array. The most frequent letter in the buffer, determined by majority voting, is added to the results. This helps improve accuracy and reduces noise in the classification.
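A minimal sketch of this buffering and majority-voting step, under the assumption of a 20-prediction buffer and illustrative "SPACE" and "DELETE" labels for the two special signs:

    const BUFFER_SIZE = 20;   // number of recent predictions to vote over (illustrative)
    const buffer = [];
    let resultText = "";

    function bufferPrediction(letter) {
      buffer.push(letter);
      if (buffer.length < BUFFER_SIZE) return;

      // Majority vote over the buffered predictions to suppress noisy frames.
      const counts = {};
      for (const l of buffer) counts[l] = (counts[l] || 0) + 1;
      const winner = Object.keys(counts).reduce((a, b) => (counts[a] >= counts[b] ? a : b));

      if (winner === "SPACE") resultText += " ";
      else if (winner === "DELETE") resultText = resultText.slice(0, -1);
      else resultText += winner;

      buffer.length = 0; // reset the buffer for the next letter
    }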
The combined pipeline involves the integration of the Mediapipe Hand model, KNN classification, majority voting, and buffer storage. This enables the system to recognize hand gestures, display corresponding alphabets, and facilitate word formation based on signed poses. The system provides real-time feedback to the impaired user, allowing them to communicate sign language gestures effectively.
The result can also be spoken using the Speech Synthesis interface (Speech API).
IV. OTHER SPECIFICATIONS
A. Advantages
B. Limitations
V. RESULTS AND SCREENSHOTS
Our system achieved an accuracy of 98% on a test set of sign language gestures; in real-time video the accuracy may vary, for example under different lighting and background conditions. User feedback and user experience surveys were conducted to assess the system's usability and user satisfaction.
By leveraging the KNN algorithm, we demonstrated the effectiveness of the SLRS in bridging the communication gap between the hearing and non-hearing populations. The system provides a real-time interpretation of sign language gestures, enabling accessible and inclusive communication.
VI. FUTURE WORK
Further enhancements and future work can focus on expanding the gesture recognition model's vocabulary, optimizing the system's performance, and incorporating additional features to improve usability.
This system can be combined with another system that recognizes word-level sign gestures to make an extensive system that supports all types of signing. The current model has some limitations such as environmental factors like low lighting conditions and uncontrolled background which decrease the accuracy of the detection.
Future work could involve addressing these limitations, exploring alternative feature extraction methods or machine learning algorithms, or scaling the system to support multiple users.
VII. ACKNOWLEDGEMENT
We would like to express our sincere gratitude to our project guide, Prof. Swati Gade, for her continuous support, patient guidance, enthusiastic encouragement, valuable critiques, motivation, and immense knowledge. We would also like to thank her for her insightful comments and for the probing questions that prompted us to widen our research from various perspectives. She was a constant source of information throughout the entire journey, and we consider ourselves fortunate to have worked under the guidance of such an eminent personality. Our sincere thanks also go to all the staff members of the Computer Engineering department for their valuable information and feedback.
We are grateful for all the cooperation received during the project. Finally, we would like to thank our families for supporting us throughout the work on this project, and our friends for their moral support and encouragement throughout the project study.
VIII. CONCLUSION
In this project, we demonstrated the effectiveness of using the KNN classifier and the MediaPipe Hands library for real-time sign language recognition. Our system achieved high accuracy on a test set of sign language gestures and performed well in real-world conditions. The system has the potential to improve communication between deaf and non-deaf individuals and enhance accessibility for the deaf community. Additionally, the use of a custom dataset created by us highlights the potential for building targeted, application-specific datasets for sign language recognition. The Sign Language Recognition System developed in this paper showcases the potential of combining the MediaPipe Hands model and web technologies to build an accessible and inclusive communication tool for the deaf and hard-of-hearing community. The system accurately interprets sign language gestures in real time and provides corresponding textual output, enabling effective communication between hearing and non-hearing individuals.
[1] J. P. Singh, A. Gupta and Ankita, "Scientific Exploration of Hand Gesture Recognition to Text," 2020 International Conference on Electronics and Sustainable Communication Systems (ICESC), 2020, pp. 363-367, DOI: 10.1109/ICESC48915.2020.9155652
[2] Tanmay Petkar, Tanay Patil, Ashwini Wadhankar, Vaishnavi Chandore, Vaishnavi Umate, Dhanshri Hingnekar, "Real Time Sign Language Recognition System for Hearing and Speech Impaired People," IJRASET Journal for Research in Applied Science and Engineering Technology, Volume 10, Issue 4, April 2022
[3] Sahoo, Jaya Prakash, Allam Jaya Prakash, Paweł Pławiak, and Saunak Samantray, "Real-Time Hand Gesture Recognition Using Fine-Tuned Convolutional Neural Network," Sensors 22, no. 3: 706, 2022. https://doi.org/10.3390/s22030706
[4] Abdullah Mujahid, Mazhar Javed Awan, Awais Yasin, Mazin Abed Mohammed, Robertas Damaševičius, Rytis Maskeliūnas and Karrar Hameed Abdulkareem, "Real-Time Hand Gesture Recognition Based on Deep Learning YOLOv3 Model," Appl. Sci., 2021
[5] Ankit Ojha, Ayush Pandey, Shubham Maurya, Abhishek Thakur, Dr Dayananda P, "Sign Language to Text and Speech Translation in Real Time Using Convolutional Neural Network," International Journal of Engineering Research & Technology (IJERT), NCAIT 2020, Volume 8, Issue 15, 2020
Copyright © 2023 Prof. Swati Gade, Yash Chaudhari, Neelam Koli, Pranav Sanas, Amruta Tilekar. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Paper Id : IJRASET52628
Publish Date : 2023-05-20
ISSN : 2321-9653
Publisher Name : IJRASET